We investigate the random walk of prices by developing a simple model relating the properties of 
the signs and absolute values of individual price changes to the diffusion rate (volatility) of prices 
at longer time scales. We show that this benchmark model is unable to reproduce the diffusion 
properties of real prices. Specifically, we find that for one hour intervals this model consistently 
over-predicts the volatility of real price series by about 70%, and that this effect becomes stronger 
as the length of the intervals increases. By selectively shuffling some components of the data while 
preserving others we are able to show that this discrepancy is caused by a subtle but long-range 
non-contemporaneous correlation between the signs and sizes of individual returns. We conjecture 
that this is related to the long-memory of transaction signs and the need to enforce market efficiency. 



I. INTRODUCTION 

The random walk was originally introduced in finance 
in 1900 [3] as an empirical model for prices. It is still 
widely used in finance for many practical problems, such 
as option and interest rate pricing. The conceptual justi- 
fication for the random walk description of asset price is 
market efficiency, i.e. that asset price changes should be 
unpredictable [11]. Even if prices do not follow a perfect 
random walk, for many purposes this is an excellent ap- 
proximation: While there may be some structure in the 
drift term, so that occasionally clever arbitrageurs can 
predict and exploit small deviations from randomness, 
basically the direction of price movements is very close 
to random. 

The well-known non-random exception is the diffusion
rate of prices, which in finance is usually called
the volatility. As first carefully documented by Engle
[15], volatility varies in a way that is quite persistent in
time. Its autocorrelation function dies out slowly with
an asymptotic power law decay for long times, so that it
is a long-memory process [8, 13]. The question of
what causes variations in volatility is more complicated,
and remains unanswered. Standard equilibrium theories
in economics say that volatility should be caused by new
information [9], but new information is difficult to measure,
and while a few recent studies seem to support this
on longer time scales [21] [12], there are several studies
on shorter time scales suggesting that the correlation
between volatility and news is weak [33].

An alternative approach is to look for immediate
causes of volatility. For example, Clark suggested modeling
volatility as a subordinated stochastic process, in
which the transaction rate varies^ and consequently the
diffusion rate also varies [10]. Variations in the transaction
rate correlate with volatility, so this theory is at least
partially correct [5, 35]. However, a more recent study
shows that, at least on a time scale of fifteen minutes, this
is not the dominant correlate of volatility. Instead, there
is a much larger effect due to the size of individual price
changes, which is only weakly correlated with the size
of transactions and with the transaction frequency [21].
The long-memory properties of individual price changes
also match those of volatility much better than those of
volume or transaction frequency.

This story is further complicated by the fact that transaction
signs have long-memory [55]. Transactions
can be labeled as having a positive sign if they are
initiated by a buyer, i.e. if the trading order that triggers
the transaction is from a buyer, and similarly as having a
negative sign if they are initiated by a seller. Sequences of
transaction signs have long-memory, i.e. their autocorrelation
functions $C(\tau)$ decay as a power law $C(\tau) \sim \tau^{-\gamma}$
with $\gamma < 1$, typically with $\gamma \approx 0.5$. This strong
autocorrelation structure implies that the signs of transactions
are quite predictable using a trivial algorithm.
Since buyer initiated transactions tend to push the price
up and seller initiated transactions tend to push it down,
this suggests that prices should also be predictable, which
would contradict market efficiency. To prevent this from
happening there must be a non-trivial relationship between
transactions and price responses.

We add to this story by studying a simple model for the 
aggregation properties of non-zero price returns at the 
level of individual transactions. In particular, we view 
price changes as the steps in a generalized random walk. 
The term generalized random walk refers to the possibil- 
ity that there are correlations in the signs of the steps and 
their sizes. Under the assumption that price changes are 
permanent, we develop a model predicting the expected
volatility in terms of properties of the generalized random
walk, such as the number of steps, the average step size,
the variance of the step sizes, the imbalances between
positive and negative steps, and sums of the autocorrelation
functions for step signs and sizes. Restated in terms
of prices, the model is based on the number of non-zero
price changes, the average size of price changes, the variance
of the size of price changes, the imbalance between
up and down price movements, and the sum of the autocorrelation
functions of price change sign and size. We
show that this model performs poorly in describing the
volatility of real data. We show that the predictions of
this model for volatility are much too large, by the order
of 70% of the empirical volatility, and that the cause
of this over-prediction is the relationship between lagged
signs and sizes of the price changes.

^ ... information, thus this idea neither supports nor contradicts the
standard theory.

The paper is organized as follows. In Section II we
develop a model for the random walk of prices, and in
Section III we describe the data. In Section IV we test the
hypotheses of the model, and in Section V we present our
main empirical results. Finally, in Section VI we summarize
the paper, discuss what the results mean, and outline
future work.



II. THE GENERALIZED RANDOM WALK

Price returns are defined in logarithmic terms as
$r_t = \log p_t - \log p_{t-1}$, where $p_t$ is the price at time $t$. In this
paper we will take $p_t$ to be the midprice, i.e. the average
of the best quoted buying and selling prices. We consider
only non-zero returns $r_t \neq 0$, and define the time
variable $t$ under the transformation $t \to t + 1$, which
occurs whenever the midprice changes. That is, except
where otherwise stated, throughout this paper time is
just a counter labeling the number of non-zero steps for
the random walk of price changes.

An additive stochastic process

$$R_n = \sum_{t=1}^{n} r_t, \qquad (1)$$

where the increments $r_t$ are stationary, defines what we
will call a generalized random walk. We use this term
to distinguish it from a "pure" random walk, in which
the increments $r_t$ are independent and identically distributed
(IID). Our purpose here is to make a model for
the squared volatility, which we will measure as the variance
$\mathrm{Var}(R_n)$, in terms of the underlying properties of
a generalized random walk with increments $r_t$. For this
purpose it is useful to decompose the individual returns
as $r_t = s_t w_t$, where $s_t$ is the sign of the return at time $t$,
and $w_t$ is its magnitude.

A. Assumptions

To make our analysis tractable we make the following
assumptions:

(i) $s_t$ is a Bernoulli variable $\forall t$.

(ii) $w_t$ is a strictly positive random variable $\forall t$.

(iii) Both $\{s_t\}$ and $\{w_t\}$ are wide sense stationary processes^.

(iv) $\{s_t\}$ and $\{w_t\}$ are independent stochastic processes.

The first two assumptions are simply the decomposition
of any single step into its sign and its magnitude. The
third assumption of stationarity is important because it
implies that the autocorrelation function $c_x(t,u)$ of a process
$x$ between times $t$ and $u$ depends only on the
lag, i.e. $c_x(t,u) = c_x(|t - u|)$. Note that in making this
assumption we are also assuming that the first two central
moments of the distribution of $w_t$ are finite (this is
automatic for $s_t$). The fourth hypothesis greatly simplifies
calculations, since under the independence hypothesis
the joint probability density function of any given
subset of these variables can be factorized. We will see
that the first three assumptions are fine, but the fourth
assumption is not well-satisfied for the real data.

^ A stochastic process $\{X_t\}$, where $t \in \mathbb{Z}$ represents the integers, is
wide sense stationary (WSS) if (i) $E[X_t^2] < \infty\ \forall t$, (ii) $E[X_t] = \mu_X\ \forall t$,
(iii) $E[X_t X_u] = E[X_{t+h} X_{u+h}]\ \forall t,u,h$.

B. Derivation of formula for volatility

The squared volatility is the variance of $R_n$ and can
be computed as

$$\mathrm{Var}(R_n) \equiv E[R_n^2] - E[R_n]^2 = E\Big[\sum_{i=1}^{n}\sum_{j=1}^{n} s_i w_i s_j w_j\Big] - \Big(E\Big[\sum_{i=1}^{n} s_i w_i\Big]\Big)^2,$$

where $E[\cdot]$ represents the expected value.

Let the means of $s$ and $w$ be $\mu_s$ and $\mu_w$ and the
variances be $\sigma_s^2$ and $\sigma_w^2$. Since both the s-process and
the w-process are stationary we can write $E[s_i] = \mu_s$,
$E[w_i] = \mu_w$, and $E[w_i^2] = \sigma_w^2 + \mu_w^2$ for all $i$. Moreover $w_t$
and $s_t$ are independent processes and we can factorize
$E[s_i w_i] = E[s_i]E[w_i]$ and $E[s_i s_j w_i w_j] = E[s_i s_j]E[w_i w_j]$.
Then we obtain
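$$\mathrm{Var}(R_n) = \sum_{i=1}^{n} E[s_i^2]\,E[w_i^2] + 2\sum_{i<j}^{n} E[s_i s_j]\,E[w_i w_j] - \Big(\sum_{i=1}^{n} \mu_s \mu_w\Big)^2$$

$$= n\,[\sigma_w^2 + \mu_w^2] + 2\sum_{i<j}^{n} [\sigma_s^2 c_s(i,j) + \mu_s^2]\,[\sigma_w^2 c_w(i,j) + \mu_w^2] - n^2 \mu_s^2 \mu_w^2, \qquad (2)$$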



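where $c_s(i,j)$ and $c_w(i,j)$ are the autocorrelation functions
of the sign process and the size process. The sum
in the second term can be written explicitly as

$$\sum_{i<j}^{n} [\sigma_s^2 c_s(i,j) + \mu_s^2]\,[\sigma_w^2 c_w(i,j) + \mu_w^2] = \sigma_s^2 \sigma_w^2 \sum_{i<j}^{n} c_s(i,j)\, c_w(i,j) + \sigma_s^2 \mu_w^2 \sum_{i<j}^{n} c_s(i,j) + \mu_s^2 \sigma_w^2 \sum_{i<j}^{n} c_w(i,j) + \frac{n(n-1)}{2}\, \mu_s^2 \mu_w^2.$$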



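Let $f : \mathbb{N} \times \mathbb{N} \to \mathbb{R}$ be a generic function of two integer
variables. If $f(i,j) = f(|i - j|)$ for all $i$ and $j$, then

$$\sum_{i<j}^{n} f(i,j) = n \sum_{l=1}^{n} \Big(1 - \frac{l}{n}\Big) f(l). \qquad (3)$$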
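The identity in Eq. (3) follows from counting pairs by their lag. The short derivation below is a sketch added for clarity; it assumes the form of Eq. (3) with weights $(1 - l/n)$, which is the form that matches the definitions of the $K$ functions used in Eq. (4).

```latex
% Among the pairs (i, j) with 1 <= i < j <= n, exactly (n - l) pairs have lag l = j - i.
\sum_{i<j}^{n} f(|i-j|)
  = \sum_{l=1}^{n-1} (n-l)\, f(l)
  = n \sum_{l=1}^{n} \left(1 - \frac{l}{n}\right) f(l),
% where the l = n term can be included because its weight (1 - n/n) is zero.
% Check for n = 3: pairs (1,2) and (2,3) have lag 1, pair (1,3) has lag 2, so
% \sum_{i<j} f = 2 f(1) + f(2) = 3 \left[ \tfrac{2}{3} f(1) + \tfrac{1}{3} f(2) \right].
```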



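Since both $s_t$ and $w_t$ are stationary, Eq. (3) holds both
for $c_s$ and $c_w$. Then we can use Eq. (3) in Eq. (2), and Eq. (2)
becomes

$$V = n\Big\{ \big[1 + 2\sigma_s^2 K_{s,w}(n) + 2\mu_s^2 K_w(n)\big]\,\sigma_w^2 + \big[1 + 2\sigma_s^2 K_s(n) - \mu_s^2\big]\,\mu_w^2 \Big\}, \qquad (4)$$

where $K_{s,w}(n) = \sum_{l=1}^{n} (1 - l/n)\, c_s(l)\, c_w(l)$,
$K_s(n) = \sum_{l=1}^{n} (1 - l/n)\, c_s(l)$
and $K_w(n) = \sum_{l=1}^{n} (1 - l/n)\, c_w(l)$ are
functions of the number of steps $n$ and depend on the
autocorrelation structures of both signs and sizes. We
have introduced the notation $V$ to emphasize that we
will be using this as a prediction for squared volatility.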


III. DATA 

To test the validity of Eq. (4) we used data for four
highly capitalized stocks traded on the London Stock
Exchange (LSE). The stocks are AstraZeneca (AZN),
Lloyds TSB Group (LLOY), Shell Transport & Trading
Co. (SHEL), and Vodafone Group (VOD). The investigated
period spans more than two years, ranging from
May 2, 2000 to December 31, 2002, for a total of 675
trading days. Summary statistics are given in Table I.



The total number of non-zero returns in the sample is
roughly 300,000 for each of these stocks. There is on average
about one price change per minute, but the trading
activity fluctuates considerably and in some periods there
can be more than one price change per second.

We have left out the first and last fifteen minutes of 
each trading day. This choice avoids biases due to the 
extremely high activity at market opening and closing. 
If we include data for the full day we get practically the 
same results. Overnight price changes are omitted. We 
have also performed tests removing outliers and found 
that this makes no difference in our results. 



IV. TESTING THE HYPOTHESES OF THE 
MODEL 



Before testing the model we first test the hypotheses
of the model given in Section II A. In particular we check
the stationarity of both $s_t$ and $w_t$, compute their autocorrelation
functions, and test their independence.

To test for stationarity we used two standard tests that
are widely employed in time series analysis, the augmented
Dickey-Fuller test and the Phillips-Perron test.
Both are tests for the null hypothesis that a time
series $x_t$ has a unit root $a$, i.e. that under the model
$x_t = a x_{t-1} + \eta_t$, where $\eta_t$ is IID noise, $a = 1$. We applied
these tests to the entire time series of signs and sizes.
For each stock we found that for both $s_t$ and $w_t$ the null
hypothesis of non-stationarity (unit root) can be rejected
with a p-value smaller^ than 2 x 10~^^. We also applied 
these tests to individual days of data and found that the
null hypothesis of non-stationarity can be rejected with
a p-value smaller than 0.05 in more than 95% of the
days, with essentially the same results for all stocks. We
therefore conclude that both $s_t$ and $w_t$ can be considered
stationary processes.

The wide-sense stationarity hypothesis assumes that
the first two central moments $\mu_w$ and $\sigma_w^2$ of $w_t$ are finite.
Since these also appear in Eq. (4) it is particularly
important to test that this is true. Many studies have
shown that the probability distribution of price returns
has power law tails, i.e. that $P(|r_t| > x) \sim x^{-\alpha}$ as
$x \to \infty$, with $0 < \alpha < \infty$. This implies that moments
less than $\alpha$ exist, but moments greater than $\alpha$ are infinite.
In particular, $\alpha > 2$ is sufficient to guarantee that $\mu_w$
and $\sigma_w^2$ are both well-defined. Early studies suggested
that price returns are described by Levy distributions,
which have $\alpha < 2$ [17], but most later studies have
measured $\alpha > 2$ [55]. Just to make sure, we
estimated the tail index using a Hill estimator, as



^ Such a strong result is partially due to the high number of data
points in each sample, but is also due to the fact that the computed
root is always much smaller than one (about 10^^) in both
TABLE I: Summary statistics of the stocks in our sample. Sample size, tail index $\alpha$ (Hill estimator) and Hurst exponent $H$ of
the absolute returns. Note that the quoted significance intervals are standard errors and are much too small.

                                        AZN             LLOY            SHEL            VOD
Number of trades (x 10^5)               5.5             5.7             5.9             9.4
Number of non-zero returns (x 10^5)     3.2             2.7             2.7             3.4
Trades per 15 min.                      23.7            24.9            25.4            43.5
Non-zero returns per 15 min.            14.8            12.4            12.6            15.6
Tail index of w_t (alpha)               3.0 ± 0.04      3.2 ± 0.05      3.7 ± 0.05      8.6 ± 0.11
Hurst exponent of w_t (H)               0.80 ± 0.006    0.80 ± 0.005    0.85 ± 0.011    0.86 ± 0.014



presented in Table I. In every case we find* that $\alpha > 2$.
We conclude that the first and the second moment of the
absolute return distribution exist.
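The Hill estimator used for the tail index can be sketched as follows. This is a minimal illustration, not the exact procedure behind Table I: the choice of the number of tail points `k` is an assumption (the text does not state it), and the synthetic Pareto sample is only there to show that the estimator recovers a known exponent.

```python
import numpy as np

def hill_tail_index(x, k):
    """Hill estimate of the tail index alpha from the k largest values of x.

    x must be positive (e.g. absolute returns); k is a tuning choice
    that the original text does not specify.
    """
    x = np.sort(np.asarray(x, dtype=float))
    tail = x[-k:]             # the k largest observations
    threshold = x[-k - 1]     # the (k+1)-th largest observation
    return 1.0 / np.mean(np.log(tail / threshold))

# Synthetic check on a classical Pareto sample with alpha = 3 (illustrative data only).
rng = np.random.default_rng(0)
sample = rng.pareto(3.0, size=200_000) + 1.0
alpha_hat = hill_tail_index(sample, k=2000)
```

An estimate above 2, as found for all four stocks, is what guarantees that the mean and variance of the absolute returns entering Eq. (4) are finite.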

We also studied the autocorrelation structure of $s_t$ and
$w_t$. For each stock we estimated autocorrelation functions
from the entire sample. We find that the absolute
returns $w_t$ form a long-memory process,
i.e., the autocorrelation function is asymptotically
a power law $c_w(\tau) \sim c_w(0)\,\tau^{-\gamma}$ with $\gamma < 1$. In contrast,
the sign process has some non-zero structure in its autocorrelation
function, but is not long-memory.

To give a qualitative feeling for the long-memory nature
of the size process, in Fig. 1(a) we show
$C_w(\tau) = \sum_{t=1}^{\tau} c_w(t)$, the cumulative sum of the autocorrelation
function up to time $\tau$, in double logarithmic scale. This
makes it clear that a power law is a reasonable approximation
and that the integral is increasing without bound.
The long-memory nature of a stochastic process can also
be characterized by the Hurst exponent $H$, which is related
to the decay exponent $\gamma$ as $\gamma = 2 - 2H$ [1]. From a
statistical point of view, computing the Hurst exponent
is a more reliable indicator of long-memory than working
with the autocorrelation function. We estimate $H$ by
using the Detrended Fluctuation Analysis (DFA) introduced
in [31]. In Table I we report the value of the Hurst
exponents of $w_t$ for different stocks^. We see that $H$ is
always in [0.80, 0.86], which implies $\gamma \in [0.28, 0.40]$.
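A compact sketch of how a Hurst exponent can be estimated with first-order DFA is given below. It is an illustrative implementation under simple assumptions (linear detrending, non-overlapping windows, a hand-picked set of window sizes); the settings actually used for Table I are not stated in the text.

```python
import numpy as np

def dfa_hurst(x, scales):
    """Estimate the Hurst exponent with first-order DFA (linear detrending)."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())              # integrated, mean-removed series
    fluctuations = []
    for s in scales:
        n_win = len(profile) // s
        windows = profile[:n_win * s].reshape(n_win, s).T   # shape (s, n_win)
        t = np.arange(s)
        slope, intercept = np.polyfit(t, windows, 1)        # one linear fit per window
        residuals = windows - (np.outer(t, slope) + intercept)
        fluctuations.append(np.sqrt(np.mean(residuals ** 2)))
    # H is the slope of log F(s) against log s
    return np.polyfit(np.log(scales), np.log(fluctuations), 1)[0]

def decay_exponent(hurst):
    """Autocorrelation decay exponent gamma = 2 - 2H, as quoted in the text."""
    return 2.0 - 2.0 * hurst
```

For example, `decay_exponent(0.80)` gives 0.40 and `decay_exponent(0.86)` gives 0.28, which reproduces the interval for $\gamma$ quoted above; uncorrelated noise should give an estimate of $H$ close to 0.5.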

In contrast, the autocorrelation function of the signs $s_t$
looks completely different. In every case, after at most
10-15 trades the size of the autocorrelation function becomes
small enough to be within statistical error. Even
though each coefficient is small, there may be trends in
which nearby coefficients tend to be positive or negative,
so we have once again computed $C_s(\tau) = \sum_{t=1}^{\tau} c_s(t)$, as
shown in Fig. 1(b). We cannot plot this in double logarithmic
scale because of negative values, so we show it in



[Figure 1: (a) Absolute returns, $C_w(\tau)$ versus $\tau$ on double logarithmic axes; (b) Signs, $C_s(\tau)$ versus $\tau$ on linear axes.]

FIG. 1: Cumulative autocorrelation functions for the four
investigated stocks. Top: Absolute returns in double logarithmic
scale. Bottom: Signs in linear scale.



* For Vodafone we find $\alpha \approx 8.5$, which calls into question whether 
the returns really obey a power law at all. In any case this does 
not matter for our results here. 

^ Given the long-memory of the absolute returns we expect that 
the variance of the absolute return on smaller time scales (e.g. 
15 minutes or 1 hour) will be slightly smaller than the variance 
computed on the entire sample and we indeed observe this. 



linear scale. We see that there are some persistent effects
involving the accumulation of small terms in the autocorrelation
function; in some cases $C_s(\tau)$ takes hundreds
of transactions to approach its asymptotic value. Nevertheless,
the behavior is dramatically different from the
long-memory behavior of $C_w(\tau)$, as evidenced by the fact
that for the signs $C_s(1000) < 0.6$ in every case, whereas
for the sizes $C_w(1000) > 40$, and in some cases is closer
to 400.
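The cumulative autocorrelation functions $C_s(\tau)$ and $C_w(\tau)$ compared above can be computed with a few lines of code. The sketch below uses a plain sample estimator of the autocorrelation; the function names are illustrative and the estimator details are assumptions, since the text does not specify them.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation c(t) for lags t = 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = np.mean(x ** 2)
    return np.array([np.mean(x[:-lag] * x[lag:])
                     for lag in range(1, max_lag + 1)]) / var

def cumulative_autocorrelation(x, max_lag):
    """C(tau) = sum_{t=1}^{tau} c(t) for tau = 1..max_lag."""
    return np.cumsum(autocorrelation(x, max_lag))
```

For a long-memory series with $\gamma < 1$ the cumulative sum keeps growing with $\tau$, whereas for a short-memory series it saturates, which is the contrast between sizes and signs described in the text.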

In conclusion, absolute returns are persistent in time,
in agreement with other studies that have found that
volatility is a long-memory process [13]. In contrast,
signs are only weakly autocorrelated, which they
have to be to be compatible with market efficiency. The
signs of non-zero returns should not be confused with the
signs of transactions, which as we have already mentioned
form a long-memory process. We say more about
the possible importance of this in the conclusions.




FIG. 2: Lagged cross-correlation function of signs and absolute
returns for the stock AZN. The dashed line is the $2\sigma$
standard error.



Finally we need to test the hypothesis of independence
between signs and absolute returns. This is not a simple
task. A first naive approach is to compute the
cross-correlation function between the two time series, as
shown for AZN in Fig. 2. For all the stocks in the
sample the cross-correlations are practically negligible,
always less than or comparable to the noise level^ at any
lag. This might suggest that the assumption of independence
between signs and sizes is a good approximation.
However, note the patterns in the correlations
in Fig. 2. This suggests that even if the individual coefficients
are small, there may be significant integrated
effects, and in any case one must also worry about
nonlinear interactions. Later we will show that independence
of the sign and size is not a good assumption.
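The naive independence check described above, a lagged cross-correlation compared against a $2/\sqrt{N}$ noise band, can be sketched as follows. The function name and the sign convention for the lag are choices made here for illustration.

```python
import numpy as np

def lagged_cross_correlation(s, w, max_lag):
    """Correlation between s_t and w_{t+lag} for lag = -max_lag..max_lag."""
    s = np.asarray(s, dtype=float)
    w = np.asarray(w, dtype=float)
    s = (s - s.mean()) / s.std()
    w = (w - w.mean()) / w.std()
    n = len(s)
    lags = np.arange(-max_lag, max_lag + 1)
    cc = np.empty(len(lags))
    for k, lag in enumerate(lags):
        if lag >= 0:
            cc[k] = np.mean(s[:n - lag] * w[lag:])    # s_t paired with w_{t+lag}
        else:
            cc[k] = np.mean(s[-lag:] * w[:n + lag])   # s_t paired with an earlier w
    return lags, cc

def noise_band(n):
    """Two-sigma noise level 2 / sqrt(N) for a series of length N."""
    return 2.0 / np.sqrt(n)
```

As the text stresses, coefficients that individually stay inside this band do not rule out a sizeable effect once they are summed over many lags, so this check alone cannot establish independence.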



V. ESTIMATING VOLATILITY 

A. Testing the model 

Our goal is to test the validity of Eq. (4). To do this
we divide the original time series into non-overlapping
real time intervals of length $T$ = 15 minutes, 1 hour and
4 hours. We measure the total price return $R_i$ during
each interval $i$ and use $R_i^2$ for that interval as a proxy
for its empirical squared volatility. We compare this to
the squared volatility prediction $V_i$ based on Eq. (4). For
each interval $i$ we estimate $\mu_s$, $\sigma_s$, $\mu_w$, $\sigma_w$, and count the
number of non-zero returns $n_i$. In contrast, under our
stationarity assumption $K_s(n_i)$ and $K_w(n_i)$ should not
depend on which interval we choose. This is fortunate
because there is not enough data in an individual interval
to get a statistically stable estimate. Thus we estimate
them for each stock using data from the entire time series.
Finally, we compute the ratio

$$\rho_i = \frac{R_i^2}{V_i}. \qquad (5)$$



^ This is computed as the $2\sigma$-error, where $\sigma = 1/\sqrt{N}$ and $N$ is the
length of the series.



If all the assumptions of the model were correct we should
find that $\rho = 1$ to within statistical errors.
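The interval procedure can be sketched in a few lines. This is an illustrative outline under stated assumptions: each interval is supplied as an array of its non-zero returns, the per-interval moments are plain sample estimates, and the autocorrelation terms are passed in through a hypothetical callable `K` that returns $(K_s, K_w, K_{s,w})$ for a given $n$ (defaulting to zero, i.e. an uncorrelated walk).

```python
import numpy as np

def eq4_prediction(n, mu_s, mu_w, var_w, K_s=0.0, K_w=0.0, K_sw=0.0):
    """Squared-volatility prediction of Eq. (4) from precomputed K terms."""
    var_s = 1.0 - mu_s ** 2
    return n * ((1 + 2 * var_s * K_sw + 2 * mu_s ** 2 * K_w) * var_w
                + (1 + 2 * var_s * K_s - mu_s ** 2) * mu_w ** 2)

def volatility_ratios(intervals, K=lambda n: (0.0, 0.0, 0.0)):
    """rho_i = R_i^2 / V_i for each interval of non-zero returns."""
    ratios = []
    for r in intervals:
        r = np.asarray(r, dtype=float)
        if len(r) < 2:
            continue                      # moments are undefined for tiny intervals
        s, w = np.sign(r), np.abs(r)
        K_s, K_w, K_sw = K(len(r))
        V = eq4_prediction(len(r), s.mean(), w.mean(), w.var(), K_s, K_w, K_sw)
        if V > 0:
            ratios.append(r.sum() ** 2 / V)
    return np.array(ratios)
```

On synthetic returns with independent, uncorrelated signs and sizes the average ratio should come out close to one, with small deviations due to finite-sample effects of the kind discussed in the next paragraphs.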

As a reality check and to get a feeling for the expected
statistical errors, we begin by testing this procedure on
simulated data that is guaranteed to satisfy the assumptions
of Section II A. To make the test as realistic as
possible we use the original AZN series of signs, and
we generate an artificial long-memory series of
absolute returns with a Hurst exponent $H = 0.7$
using a standard fractional Brownian motion generator.
This series explicitly differs from the real data in that
it is log-normally distributed, and the sign and absolute
return series are guaranteed to be independent of each
other. We ran the simulation on one hour intervals, i.e.
we sample the artificial series in non-overlapping
sub-intervals with the same number of non-zero returns as in the
one-hour sampling of the original AZN series. Our results
are reported in Fig. 3, where we show the average
value of $\rho$ conditioned on the expected volatility. We
consistently find $\rho \approx 0.9$. Thus while our derived model
gives a reasonable approximation, within ten percent of
the correct answer, the predicted volatility is consistently
slightly higher than the observed volatility.

We have performed extensive numerical experiments
to understand the source of this discrepancy, and these make it clear
that its origin is statistical bias. Because
these data are highly skewed and display long-memory,
when the variance is estimated with a finite number of
data points the estimate of the volatility is systematically
low. This effect enters both for the empirical volatility
itself (which we are measuring based on the price change
across the whole interval, i.e. effectively with one point),
and for the variance of the sizes of the returns. Thus there
is some cancellation, but the effect is more severe for the
volatility, and hence the estimates tend to be low. While
we might be able to correct for this effect and improve
the accuracy of our measurements, this is not trivial and
the other effects that we observe later in this paper are
sufficiently large that they dominate.



[Figure 3: ratio $\rho$ (vertical axis) versus expected volatility (horizontal axis). Legend: Expected, Simulation, Average Values, Real Data.]

FIG. 3: The ratio $\rho$ of the empirical volatility and the expected
volatility from Eq. (5) as a function of the expected
volatility. The simulated data are shown as squares and the
real data for AZN as circles. Data are binned on the x-axis
using quantiles with 10 bins and 538 intervals per bin. The error
bars represent standard errors. The weighted mean value
for the simulation is close to one, whereas the real data is
closer to 0.6.

In contrast, for real data the expected volatility is always
significantly larger than the empirical volatility, as
shown in Fig. 3 for AZN and reported for the other
stocks in Table II. For 1-hour sampling the ratio $\rho$ for
AZN is close to 0.6 and for the four stocks in our sample
is in the range $\rho \in [0.59, 0.67]$. As shown in Table II
this overestimation also holds at all time scales, and gets
worse as the time scale increases. We conclude that our
model consistently over-estimates the squared volatility
by roughly 20% or more at 15-minute time scales and
more than 67% at four hour time scales.
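The percentages quoted here follow from the ratio $\rho$ by a one-line conversion. The worked values below are added for illustration and assume that the conversion is applied to the average ratios reported in Table II.

```latex
% rho = (empirical squared volatility) / (predicted squared volatility),
% so the relative over-estimate of the prediction is 1/rho - 1.
\frac{V - R^2}{R^2} = \frac{1}{\rho} - 1 :
\qquad
\rho = 0.84 \;\Rightarrow\; 19\%, \quad
\rho = 0.75 \;\Rightarrow\; 33\%, \quad
\rho = 0.60 \;\Rightarrow\; 67\%, \quad
\rho = 0.55 \;\Rightarrow\; 82\%.
```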



B. Shuffling experiments

To understand why the model of Eq. (4) fails we perform
a series of shuffling experiments at the one hour
time scale. In each case we randomly rearrange the order
of a given component of the real data while preserving
everything else.

1. Signs. We randomly shuffle the sign time series.
This destroys the autocorrelation of the signs and
any cross-correlation between signs and absolute returns,
but preserves the autocorrelation of the absolute
returns.

2. Absolute returns. We shuffle the absolute return
time series. This destroys the autocorrelation of
absolute returns and any cross-correlation between
signs and absolute returns, but preserves the autocorrelation
of the signs.

3. Returns. We shuffle returns, i.e. we shuffle both
signs and absolute returns together, using the same
permutation. This destroys the autocorrelation of
both signs and sizes, and at the same time preserves
their contemporaneous cross-correlation while destroying
any lagged cross-correlations.

In each case we measure $\rho$ just as we did for the real data.

The results of these experiments are shown in Table II.
We see that when we shuffle signs we observe
$\rho \approx 0.93$ fairly consistently for each stock. This is smaller
than one, but much larger than the $\rho$ observed for real
data. This result is consistent with the bias we observed
earlier in our benchmark simulating long-range correlated
absolute returns. However, this effect is much too small
to explain the large discrepancy with the real data; there
must be another, much larger effect in the real series of
absolute returns.

In contrast, when we shuffle the absolute returns or
the returns, we observe $\rho \approx 1$ in almost every case^. From
this we conclude that the overestimation we observe is
neither caused by the autocorrelation of signs nor is it
caused by the contemporaneous cross correlation of signs
and absolute returns.

These experiments suggest that the main cause of the
over-estimation of volatility is the lagged relationship between
signs and absolute returns^. To test this more explicitly
we perform block shuffling experiments, in which
we randomly interchange the order of blocks of length $L$
while leaving everything the same within each block. We
performed two different tests:

1. Blocks of signs and sizes separately. We shuffle
blocks of signs and absolute returns separately.
This preserves the individual autocorrelation structures
up to the block size, but destroys any cross
correlation between the signs and the absolute returns^.

2. Blocks of returns. We shuffle blocks of returns,
keeping the same ordering of signs and sizes within



^ The exception is that when we shuffle absolute returns for Vodafone
we observe $\rho = 0.91 \pm 0.05$. We don't know whether this implies
that our error bars (based on standard errors) are too optimistic
or whether there is some effect that makes Vodafone different
from the other stocks.

* This result is in line with that obtained by Weber [34], who noticed
that the size of the largest price changes is over-estimated
if one assumes that signs and absolute returns are independent.
Moreover, this result is in some way similar to the leverage
effect, even though the latter is usually observed on daily or even weekly
time scales, while our result holds at the individual transaction time
scale.

^ When we shuffle blocks of signs and absolute returns separately 
we use different block boundaries for each. Thus a sign for a given 
return is typically matched with a different absolute return. 
TABLE II: Ratio $\rho$ of empirical volatility to expected volatility for real data and shuffling experiments.

                                AZN            LLOY           SHEL           VOD
Real data
  15 min.                       0.75 ± 0.04    0.84 ± 0.05    0.79 ± 0.06    0.71 ± 0.03
  1 hour                        0.58 ± 0.01    0.66 ± 0.01    0.63 ± 0.01    0.61 ± 0.01
  4 hours                       0.55 ± 0.02    0.57 ± 0.03    0.57 ± 0.04    0.59 ± 0.03
Shuffling (1 h.)
  Signs                         0.92 ± 0.01    0.94 ± 0.02    0.93 ± 0.02    0.93 ± 0.02
  Absolute returns              0.97 ± 0.02    1.00 ± 0.03    0.97 ± 0.02    0.91 ± 0.05
  Returns                       1.02 ± 0.02    1.01 ± 0.02    1.02 ± 0.02    1.02 ± 0.03
Block shuffling (1 h.)
  Signs and sizes               0.92 ± 0.02    0.95 ± 0.02    0.96 ± 0.02    0.93 ± 0.02
  Returns                       0.68 ± 0.02    0.75 ± 0.02    0.71 ± 0.01    0.73 ± 0.03



the block. This preserves all the autocorrelations 
and all the lagged cross correlations between signs 
and absolute returns up to the size of the blocks. 
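The shuffling and block shuffling procedures can be sketched as index permutations. The code below is a minimal illustration; the function names are hypothetical, and offsetting the block boundaries by half a block when signs and sizes are shuffled separately is an assumption made here to mimic the use of different block boundaries for each series.

```python
import numpy as np

def shuffle_returns(s, w, rng):
    """Shuffle signs and sizes with the same permutation (keeps contemporaneous pairs)."""
    p = rng.permutation(len(s))
    return s[p], w[p]

def block_permutation(n, L, rng, offset=0):
    """Indices that reorder blocks of length L, keeping the order inside each block."""
    edges = np.arange(offset, n, L)
    blocks = np.split(np.arange(n), edges[edges > 0])
    order = rng.permutation(len(blocks))
    return np.concatenate([blocks[i] for i in order])

def block_shuffle_returns(s, w, L, rng):
    """Shuffle blocks of returns: lagged sign-size relations survive within blocks."""
    idx = block_permutation(len(s), L, rng)
    return s[idx], w[idx]

def block_shuffle_separately(s, w, L, rng):
    """Shuffle blocks of signs and sizes independently, with shifted boundaries."""
    idx_s = block_permutation(len(s), L, rng)
    idx_w = block_permutation(len(w), L, rng, offset=L // 2)
    return s[idx_s], w[idx_w]
```

Shuffling signs alone or sizes alone corresponds to applying `rng.permutation` to one series and leaving the other untouched; after any of these operations the ratio $\rho$ is recomputed exactly as for the real data.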

The results for blocks of length 60 are summarized for
AZN in Fig. 4 and for all stocks in Table II. In the
experiments where we shuffle blocks of signs and sizes
we find $\rho \in [0.92, 0.96]$. As we previously observed
when shuffling signs alone with block lengths of one, $\rho$ is
slightly smaller than one, consistent with simulations reported
in Sec. V A. Thus, even if there is a small tendency
for Eq. (4) to overestimate volatility and for the proxy
$R_i^2$ to underestimate volatility, our simulations show that
this effect is small and is not sufficient to explain the
discrepancy observed in the real data. This once again
suggests that preserving the relationship between signs
and sizes is important.

In contrast, when we test this directly by shuffling
blocks of returns while preserving the relationship between
signs and sizes, we find a significant overestimation
of the volatility with $\rho \approx 0.7$. This value is significantly
smaller than one, making it clear that we have captured
most of the effect, but it is still larger than the value
$\rho \approx 0.6$ that we observed for the real data. We believe
that this is because the block length $L = 60$ is not long
enough. To test whether this is the case, in Fig. 5 we
plot the estimated value of $\rho$ for a block return shuffling
experiment for each stock as a function of block length.
As for our previous experiments with $L = 1$, we observe
$\rho \approx 1$. As $L$ increases $\rho$ decreases fairly steadily in every
case. However, this decrease is fairly slow, and at
the maximum block length $L = 600$ it has still not
decreased to the low value $\rho \approx 0.6$ observed for the full
sample. We believe this is because the maximum block
length, which is limited by our ability to obtain good statistical
sampling, is still not long enough. This suggests
the time scale of the cross-correlations between signs and
absolute returns is very long.

[Figure 4: ratio $\rho$ (vertical axis) versus expected volatility (horizontal axis).]

FIG. 4: Block shuffling experiments for AZN. We compare
shuffling blocks of returns to shuffling blocks of absolute
returns and signs separately using blocks of length $L = 60$.
Data are binned along the x-axis based on their expected
volatility in 10 bins with 538 one hour intervals per bin. The
ratio $\rho$ plotted on the vertical axis indicates whether Eq. (4)
correctly predicts the volatility for the shuffled data sets in
each range of the expected volatility; error bars are standard
errors. A horizontal black line at y = 1 is shown for comparison.
Circles are for shuffling blocks of returns, triangles for
shuffling signs and absolute values separately, and the dashed
lines are the mean values of each. Shuffling signs and absolute
returns separately destroys their lagged cross-correlation,
and results in correct estimates, while shuffling returns produces
a similar over-estimation to that observed for the real
data. This supports our hypothesis that a subtle correlation
between absolute returns and signs causes the overestimation
for real data.

[Figure 5: ratio $\rho$ (vertical axis) versus block length $L$ (logarithmic horizontal axis).]

FIG. 5: Dependence of $\rho$ on block size for return shuffling
experiments for one hour intervals. In the original
time series blocks of length $L$ are shuffled, preserving the ordering
of signs and absolute returns within each block. The
ratio $\rho$, which measures the amount by which Eq. (4) overestimates
volatility, is plotted as a function of the block size $L$
for each of the four stocks in our sample. This shows that as
the block size increases the estimates decrease fairly steadily
toward the observed values for the real data.



VI. DISCUSSION AND CONCLUSIONS 



Under the assumptions given in Section II A we have 
derived a formula for volatility under a simple general- 
ized random walk model. For a time interval of any given 
length, this formula relates volatility to simple properties 
of the underlying random walk, in particular the number 
of non-zero returns, and the mean and variance of the 
signs and absolute values of returns, as well as their in- 
tegrated autocorrelation. 

We find that this formula consistently overestimates
volatility. We have shown that the main reason for this is
that our formula assumes that return signs and return
sizes are independent. In contrast, for the real data there
are long-range correlations, which are small at any given
time lag but large when integrated over long time scales.
This effect is quite large: The overestimate is roughly
67% for one hour intervals, and even more for four hour
intervals.

These results are surprising because they indicate that
the volatility is reduced by almost a half due to a subtle
long-range interaction between the signs of returns and
their sizes. This is particularly surprising because it involves
signs of returns and not the signs of transactions.
The signs of transactions form a long-memory process
while the signs of returns do not. Thus the evidence
seems to indicate that there is a very long-range interaction
between return signs and sizes, even though return
signs themselves do not show long-memory properties.



We believe that this interaction is closely related to the
interaction that takes place between the transaction signs
and returns as studied in previous work [5], but at
this point we have not been able to show this. Intuitively
this can be seen as follows: Because of the long-memory 
properties of transactions, which make their signs highly 
predictable, returns must compensate so that they are 
not equally predictable. One way to make this happen, 
as stressed by Bouchaud et al., is that price impacts are 
temporary, i.e. when transactions happen prices change 
but this change decays slowly with time. Alternatively, 
as stressed by Lillo and Farmer, price changes can have a 
permanent component, but this component varies based 
on the predictability of transaction signs: When a future 
transaction is very likely to be a buy, the size of buy re- 
turns is much larger than the size of sell returns. These 
two approaches have been shown to be equivalent [20] . 
In either case it suggests a reduction of volatility relative 
to what one would expect under an unconditional per- 
manent impact model such as the one we have developed 
here. In a future paper we hope to show that this is in- 
deed the reason for the missing component of volatility, 
or alternatively provide a better explanation. 